Artificial error generation for translation-based grammatical error correction
Abstract
Automated grammatical error correction for language learners has attracted considerable attention in recent years, especially after a number of shared tasks that have encouraged research in the area. Treating the problem as a translation task from 'incorrect' into 'correct' English using statistical machine translation has emerged as a state-of-the-art approach, but it requires vast amounts of corrected parallel data to produce useful results. Because manual annotation of incorrect text is laborious and expensive, we can instead generate artificial error-annotated data by deliberately injecting errors into correct text, producing larger amounts of parallel data with much less effort. In this work, we review previous work on artificial error generation and investigate new approaches using random and probabilistic methods for constrained and general error correction. Our methods use error statistics from a reference corpus of learner writing to generate errors in native text that look realistic and plausible in context. We investigate a number of aspects that can play a part in the error generation process, such as the origin of the native texts, the amount of context used to find suitable insertion points, the type of information encoded by the error patterns and the output error distribution. In addition, we explore the use of linguistic information for characterising errors and train systems using different combinations of real and artificial data. Results of our experiments show that artificial errors can improve system performance when they are used in combination with real learner errors, in line with previous research. These improvements are observed for both constrained and general correction, for which probabilistic methods produce the best results.
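As a rough illustration of the probabilistic idea described above, the following minimal sketch injects errors into correct text by sampling from per-token error probabilities. The abstract does not specify the pattern format or the sampling procedure, so everything here is an assumption made for illustration: the `ERROR_PATTERNS` table, its probabilities, and the function names `inject_errors` and `make_parallel_pair` are hypothetical, and context is ignored entirely.

```python
import random

# Hypothetical error statistics: for each correct token, the erroneous
# forms and the probability of each one. The values are illustrative
# only and are not taken from any learner corpus.
ERROR_PATTERNS = {
    "the": [("", 0.10), ("a", 0.05)],    # article omitted or substituted
    "in": [("on", 0.08), ("at", 0.06)],  # preposition confusion
    "has": [("have", 0.07)],             # agreement error
}


def inject_errors(tokens, patterns, rng):
    """Return a copy of tokens with errors sampled from patterns."""
    noisy = []
    for token in tokens:
        draw = rng.random()
        cumulative = 0.0
        replacement = token  # default: leave the token unchanged
        for wrong_form, prob in patterns.get(token.lower(), []):
            cumulative += prob
            if draw < cumulative:
                replacement = wrong_form
                break
        if replacement:  # an empty replacement models a deletion
            noisy.append(replacement)
    return noisy


def make_parallel_pair(sentence, patterns, rng):
    """Build an (artificial 'incorrect', original 'correct') sentence pair."""
    tokens = sentence.split()
    return " ".join(inject_errors(tokens, patterns, rng)), sentence


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for _ in range(3):
        print(make_parallel_pair("She has lived in the city", ERROR_PATTERNS, rng))
```

Each generated pair could serve as one source/target training example for a translation-based corrector. A more faithful version would condition each pattern on its surrounding context and control the output error distribution, both of which the abstract names as factors under investigation.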
We also demonstrate that systems trained on a combination of real and artificial errors can beat other highly-engineered systems and be more robust, showing that performance can be improved by focusing on the data rather than tuning system parameters. Part of our work is also devoted to the proposal of the I-measure, a new evaluation scheme that scores corrections in terms of improvement on the original text and solves known issues with existing evaluation measures.
To my family and L.A. In memory of Kika.
Related resources
Grammatical Error Correction of English as Foreign Language Learners
This study aimed to gain insight into error correction by applying two correction systems to three Iranian university students. The three students were invited to write four in-class essays throughout the semester, in which their verb errors and individually selected errors were corrected using the Code Correction System and the Individual Correction System. At the end of the study, the...
The Impact of Immediate Grammatical Error Correction on Senior English Majors’ Accuracy at Hebron University
This study aimed at investigating the effects of grammatical error correction on EFL learners’ accuracy. Twenty-two male and female senior students were chosen randomly to respond to a questionnaire investigating their beliefs about immediate grammatical error correction. Actually, the study was conducted in order to answer this question: what is the effect of grammatical error feedback on stu...
Neural Network Translation Models for Grammatical Error Correction
Phrase-based statistical machine translation (SMT) systems have previously been used for the task of grammatical error correction (GEC) to achieve state-of-the-art accuracy. The superiority of SMT systems comes from their ability to learn text transformations from erroneous to corrected text, without explicitly modeling error types. However, phrase-based SMT systems suffer from limitations of d...
Connecting the Dots: Towards Human-Level Grammatical Error Correction
We build a grammatical error correction (GEC) system primarily based on the state-of-the-art statistical machine translation (SMT) approach, using task-specific features and tuning, and further enhance it with the modeling power of neural network joint models. The SMT-based system is weak in generalizing beyond patterns seen during training and lacks granularity below the word level. To address...